Skip to content

[ENH]: garbage collector v2 orchestrator (supports forking) #4468

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 23, 2025

Conversation

codetheweb
Copy link
Contributor

@codetheweb codetheweb commented May 7, 2025

Description of changes

Adds a new garbage collector orchestrator that supports GC'ing fork trees. The new orchestrator is extensively tested with a proptest. It is not currently used outside of tests.

Copy link

github-actions bot commented May 7, 2025

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@codetheweb codetheweb force-pushed the feat-gc-orchestrator-v2 branch 3 times, most recently from c17c75a to 19f703c Compare May 8, 2025 00:13
@codetheweb codetheweb force-pushed the feat-gc-list-files-at-version-operator branch from 0c85b47 to b989fec Compare May 13, 2025 17:35
@codetheweb codetheweb force-pushed the feat-gc-orchestrator-v2 branch 6 times, most recently from 0e2236f to c6812d0 Compare May 14, 2025 23:18
@codetheweb codetheweb force-pushed the feat-gc-list-files-at-version-operator branch from b989fec to b793a23 Compare May 15, 2025 16:52
@codetheweb codetheweb force-pushed the feat-gc-orchestrator-v2 branch from c6812d0 to 650042e Compare May 15, 2025 16:52
@codetheweb codetheweb force-pushed the feat-gc-list-files-at-version-operator branch from b793a23 to 282d440 Compare May 15, 2025 20:11
@codetheweb codetheweb force-pushed the feat-gc-orchestrator-v2 branch from 650042e to 7394835 Compare May 15, 2025 20:11
@codetheweb codetheweb force-pushed the feat-gc-list-files-at-version-operator branch from 282d440 to c0389ea Compare May 15, 2025 22:38
@codetheweb codetheweb force-pushed the feat-gc-orchestrator-v2 branch from 7394835 to fc6e27b Compare May 15, 2025 22:38
@codetheweb codetheweb force-pushed the feat-gc-list-files-at-version-operator branch from c0389ea to 8b8f1bf Compare May 15, 2025 22:43
@codetheweb codetheweb force-pushed the feat-gc-orchestrator-v2 branch 2 times, most recently from 4694a76 to 8b4d891 Compare May 15, 2025 23:02
@codetheweb codetheweb force-pushed the feat-gc-list-files-at-version-operator branch from 8b8f1bf to 9853786 Compare May 15, 2025 23:27
@codetheweb codetheweb force-pushed the feat-gc-orchestrator-v2 branch from 8b4d891 to 069241e Compare May 15, 2025 23:27
@codetheweb codetheweb force-pushed the feat-gc-list-files-at-version-operator branch from 9853786 to fc1cd49 Compare May 15, 2025 23:34
@codetheweb codetheweb force-pushed the feat-gc-orchestrator-v2 branch from 069241e to e33a3c4 Compare May 15, 2025 23:34
self.dispatcher.clone()
}

async fn on_start(&mut self, ctx: &ComponentContext<Self>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the right abstraction is - when do you override on_start() v/s set initial_tasks() and the default impl of on_start() takes care of creating the tasks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to override on_start() here because ConstructVersionGraphRequest is not a task. I would be in favor of removing initial_tasks() from the interface.

@codetheweb codetheweb force-pushed the fix-sysdb-allow-v0-deletion branch from 1d4c6d1 to 7d21e46 Compare May 22, 2025 23:24
@codetheweb codetheweb force-pushed the feat-gc-orchestrator-v2 branch from 1451c71 to 53e1a29 Compare May 22, 2025 23:24
Copy link
Contributor

propel-code-bot bot commented May 22, 2025

Garbage Collector V2 Orchestrator with Fork Tree Support and Proptest-Based Validation

This PR introduces a new, extensive, and modularized garbage collector orchestrator (garbage_collector_orchestrator_v2) in Rust that can handle collection version fork trees. Accompanied by a comprehensive property-based testing framework (proptest), the design validates correctness under a wide variety of complex version/fork scenarios. The orchestrator collaborates with modular operator components, and is currently only utilized by the test suite (not active production code), enabling rigorous validation before production rollout.

Key Changes:
• Addition of garbage_collector_orchestrator_v2.rs, a stand-alone GC orchestrator that supports version fork trees across collections.
• Complete module for property-based testing: reference model, arbitrary transition/graph generators, and a state machine test harness.
• Significant supporting test infrastructure: segment file strategies, proptest helpers, and detailed invariants/stats tracking.
• Refinements in library/operator interfaces (notably for delete_unused_files, list_files_at_version) to accommodate new GC orchestrator needs (parameter corrections, clear semantics, vector types, tenant-identification).
• Build and test updates (Cargo.toml, Cargo.lock) to support new dependencies like tracing-subscriber.
• Some refactorings and cleanup in existing operator components for compatibility and testability.

Affected Areas:
• rust/garbage_collector/src/garbage_collector_orchestrator_v2.rs (new orchestrator core)
• rust/garbage_collector/tests/proptest_helpers/* (test infrastructure)
• rust/garbage_collector/src/operators/* (GC-related operators refined)
• rust/garbage_collector/tests/prop_test_garbage_collection.rs (property-based top-level test)
• Cargo.toml/Cargo.lock
• system orchestrator trait defaulting (on_start/initial_tasks signature changes)
• blockstore sparse_index_test_utils

Potential Impact:

Functionality: No production functionality is changed, as GC v2 is not yet wired in; introduces rigorous GC/retention logic for potential future use.

Performance: No impact on runtime; test code is more thorough and possibly slower, but relegated to CI/testing.

Security: No changes.

Scalability: New test harness can validate GC correctness over arbitrarily large and complex version trees, improving confidence in future scalability.

Review Focus:
• Correctness of the garbage collector orchestrator logic for forked version trees.
• Soundness and thoroughness of the proptest state machine and invariants.
• Operator interface changes (delete_unused_files, list_files_at_version): review for breaking changes or backward compatibility.
• Test isolation: no accidental activation of new orchestrator in non-test code.
• Clarity/documentation within orchestrator and test-helper modules.

Testing Needed

• Run the property-based tests (test_k8s_integration_garbage_collection) to validate invariants for varied random workloads.
CI should pass all unit/integration tests to confirm non-regression in operators and orchestrator.
• Manual (optional): review test statistics output to observe coverage and file re-use/fork characteristics.

Code Quality Assessment

rust/garbage_collector/tests/proptest_helpers/garbage_collector_under_test.rs: Complex but logically sound; follows proptest and state machine testing best practices.

rust/garbage_collector/src/operators/delete_unused_files.rs: Interface clarified; parameter types improved; explicit about tenant and input types.

rust/garbage_collector/tests/proptest_helpers/*: High-quality, readable proptest/test harness code.

system/src/execution/orchestrator.rs: Default trait implementations clarified and made safer.

rust/garbage_collector/src/garbage_collector_orchestrator_v2.rs: Well-structured, modular, focused on correctness/lint, clear error boundaries; documentation is robust.

Best Practices

Test Coverage:
• comprehensive invariant checks across generated fork/version trees

Testing:
• property-based testing
• state-machine equivalence
• invariants

Modularity:
• separation of orchestrator from operators
• explicit error handling

Documentation:
• inline docs for orchestrator and testing framework

Potential Issues

• Large test-generated graphs may expose subtle, unanticipated invariants or performance bottlenecks in end-to-end testing.
• Operator interface adjustments (e.g. input structures for file deletion) may require careful updating in production code when GC v2 is activated.

This summary was automatically generated by @propel-code-bot

@codetheweb codetheweb force-pushed the fix-sysdb-allow-v0-deletion branch from 7d21e46 to 3cbfa34 Compare May 22, 2025 23:59
@codetheweb codetheweb force-pushed the feat-gc-orchestrator-v2 branch from 53e1a29 to 8e08e2f Compare May 23, 2025 00:00
@codetheweb codetheweb force-pushed the fix-sysdb-allow-v0-deletion branch from 3cbfa34 to 7d4d78b Compare May 23, 2025 00:47
@codetheweb codetheweb force-pushed the feat-gc-orchestrator-v2 branch from 8e08e2f to de511ea Compare May 23, 2025 00:48
@codetheweb codetheweb force-pushed the fix-sysdb-allow-v0-deletion branch from 7d4d78b to dbab3d2 Compare May 23, 2025 01:25
@codetheweb codetheweb force-pushed the feat-gc-orchestrator-v2 branch from de511ea to c73fd49 Compare May 23, 2025 01:26
@codetheweb codetheweb force-pushed the fix-sysdb-allow-v0-deletion branch 2 times, most recently from 68b21f3 to 8f416dd Compare May 23, 2025 02:07
@codetheweb codetheweb force-pushed the feat-gc-orchestrator-v2 branch from c73fd49 to 3a3adda Compare May 23, 2025 02:07
@codetheweb codetheweb changed the base branch from fix-sysdb-allow-v0-deletion to graphite-base/4468 May 23, 2025 17:10
@codetheweb codetheweb force-pushed the feat-gc-orchestrator-v2 branch from 3a3adda to 7969af3 Compare May 23, 2025 17:10
@codetheweb codetheweb force-pushed the graphite-base/4468 branch from 8f416dd to 038dae8 Compare May 23, 2025 17:10
@graphite-app graphite-app bot changed the base branch from graphite-base/4468 to main May 23, 2025 17:11
@codetheweb codetheweb merged commit 9dec43e into main May 23, 2025
72 checks passed
Copy link
Contributor Author

Merge activity

@codetheweb codetheweb deleted the feat-gc-orchestrator-v2 branch May 23, 2025 22:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants